grasp detection
- Research Report > Experimental Study (0.93)
- Workflow (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models
Kulshrestha, Manav, Bukhari, S. Talha, Conover, Damon, Bera, Aniket
Robotic grasping is a fundamental capability for autonomous manipulation; however, most existing methods rely on large-scale expert annotations and necessitate retraining to handle new objects. We present VLAD-Grasp, a Vision-Language model Assisted zero-shot approach for Detecting grasps. From a single RGB-D image, our method (1) prompts a large vision-language model to generate a goal image where a straight rod "impales" the object, representing an antipodal grasp, (2) predicts depth and segmentation to lift this generated image into 3D, and (3) aligns generated and observed object point clouds via principal component analysis and correspondence-free optimization to recover an executable grasp pose. Unlike prior work, our approach is training-free and does not rely on curated grasp datasets. Despite this, VLAD-Grasp achieves performance that is competitive with or superior to that of state-of-the-art supervised models on the Cornell and Jacquard datasets. We further demonstrate zero-shot generalization to novel real-world objects on a Franka Research 3 robot, highlighting vision-language foundation models as powerful priors for robotic manipulation.
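As a rough illustration of step (3), the sketch below shows a PCA-based coarse alignment between the generated and observed object point clouds; the function names and the simple axis-alignment heuristic are assumptions for illustration, not the paper's actual correspondence-free optimization.

```python
# Hedged sketch: coarse SE(3) alignment of the generated and observed object
# point clouds via principal component analysis. A further correspondence-free
# refinement (not shown) would resolve the remaining axis-sign ambiguities.
import numpy as np

def pca_frame(points: np.ndarray):
    """Return the centroid and principal axes (as columns) of an Nx3 cloud."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt.T

def coarse_align(generated: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Homogeneous transform mapping the generated cloud onto the observed one."""
    c_gen, axes_gen = pca_frame(generated)
    c_obs, axes_obs = pca_frame(observed)
    rotation = axes_obs @ axes_gen.T
    if np.linalg.det(rotation) < 0:          # keep a proper rotation (det = +1)
        axes_gen[:, -1] *= -1
        rotation = axes_obs @ axes_gen.T
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = c_obs - rotation @ c_gen
    return transform
```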
- North America > United States > Maryland > Prince George's County > Adelphi (0.04)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation
Lee, Yeonseo, Mun, Jungwook, Shin, Hyosup, Hwang, Guebin, Nam, Junhee, Lee, Taeyeop, Jo, Sungho
Most robotic grasping methods are designed for a single gripper type, which limits their applicability in real-world scenarios requiring diverse end-effectors. We propose XGrasp, a real-time gripper-aware grasp detection framework that efficiently handles multiple gripper configurations. The proposed method addresses data scarcity by systematically augmenting existing datasets with multi-gripper annotations. XGrasp employs a hierarchical two-stage architecture. In the first stage, a Grasp Point Predictor (GPP) identifies optimal locations using global scene information and gripper specifications. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers by learning fundamental grasping characteristics. The experimental results demonstrate competitive grasp success rates across various gripper types, while achieving substantial improvements in inference speed compared to existing gripper-aware methods. Robot grasping represents a fundamental capability in autonomous manipulation systems, enabling robots to interact with objects in diverse environments.
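For intuition, a minimal InfoNCE-style contrastive objective over paired gripper and grasp embeddings is sketched below; the tensor names and the symmetric formulation are illustrative assumptions, not the paper's AWP implementation.

```python
# Hedged sketch: contrastive objective that pulls matching gripper/grasp
# embedding pairs together and pushes mismatched pairs apart.
import torch
import torch.nn.functional as F

def gripper_contrastive_loss(gripper_emb, grasp_emb, temperature=0.07):
    gripper_emb = F.normalize(gripper_emb, dim=-1)    # (B, D)
    grasp_emb = F.normalize(grasp_emb, dim=-1)        # (B, D)
    logits = gripper_emb @ grasp_emb.T / temperature  # pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: embedding i should match its own pair.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```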
- Research Report > Experimental Study (0.93)
- Workflow (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation
Zhang, Haoran, Bai, Shuanghao, Zhou, Wanqi, Zhang, Yuedi, Zhang, Qi, Ding, Pengxiang, Chi, Cheng, Wang, Donglin, Chen, Badong
Robotic grasping is one of the most fundamental tasks in robotic manipulation, and grasp detection/generation has long been the subject of extensive research. Recently, language-driven grasp generation has emerged as a promising direction due to its practical interaction capabilities. However, most existing approaches either lack sufficient reasoning and generalization capabilities or depend on complex modular pipelines. Moreover, current grasp foundation models tend to overemphasize dialog and object semantics, resulting in inferior performance and restriction to single-object grasping. To maintain strong reasoning ability and generalization in cluttered environments, we propose VCoT-Grasp, an end-to-end grasp foundation model that incorporates visual chain-of-thought reasoning to enhance visual understanding for grasp generation. VCoT-Grasp adopts a multi-turn processing paradigm that dynamically focuses on visual inputs while providing interpretable reasoning traces. For training, we refine and introduce a large-scale dataset, VCoT-GraspSet, comprising 167K synthetic images with over 1.36M grasps, as well as 400+ real-world images with more than 1.2K grasps, annotated with intermediate bounding boxes. Extensive experiments on both VCoT-GraspSet and a real robot demonstrate that our method significantly improves grasp success rates and generalizes effectively to unseen objects, backgrounds, and distractors. More details can be found at https://zhanghr2001.github.io/VCoT-Grasp.github.io.
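The multi-turn idea can be pictured as a localize-then-grasp loop, sketched below with a hypothetical model interface (`predict_box` and `predict_grasp` are placeholders, not the released API).

```python
# Hedged sketch: a two-turn visual chain-of-thought, where an intermediate
# bounding box focuses the model before the grasp is predicted.
from dataclasses import dataclass

@dataclass
class Grasp:
    x: float
    y: float
    width: float
    angle: float

def vcot_grasp(model, image, instruction: str) -> Grasp:
    # Turn 1: localize the object the instruction refers to.
    x0, y0, x1, y1 = model.predict_box(image, instruction)
    # Turn 2: predict a grasp on the zoomed-in crop for finer detail.
    crop = image.crop((x0, y0, x1, y1))
    g = model.predict_grasp(crop, instruction)
    # Map the grasp back into full-image coordinates.
    return Grasp(g.x + x0, g.y + y0, g.width, g.angle)
```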
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
Bhat, Vineet, Patel, Naman, Krishnamurthy, Prashanth, Karri, Ramesh, Khorrami, Farshad
Robotic manipulation of unseen objects via natural language commands remains challenging. Language-driven robotic grasping (LDRG) predicts stable grasp poses from natural language queries and RGB-D images. We propose MapleGrasp, a novel framework that leverages mask-guided feature pooling for efficient vision-language driven grasping. Our two-stage training first predicts segmentation masks from CLIP-based vision-language features. The second stage pools features within these masks to generate pixel-level grasp predictions, improving efficiency and reducing computation. Incorporating mask pooling yields a 7% improvement over prior approaches on the OCID-VLG benchmark. Furthermore, we introduce RefGraspNet, an open-source dataset eight times larger than existing alternatives, significantly enhancing model generalization for open-vocabulary grasping. MapleGrasp achieves a strong grasping accuracy of 89% on the RefGraspNet benchmark compared with competing methods. Our method achieves comparable performance to larger Vision-Language-Action models on the LIBERO benchmark, and shows significantly better generalization to unseen tasks. Real-world experiments on a Franka arm demonstrate a 73% success rate with unseen objects, surpassing competitive baselines by 11%. Code is provided in our GitHub repository.
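Mask-guided feature pooling itself is a small operation; a hedged sketch follows, with tensor shapes assumed for illustration rather than taken from the released code.

```python
# Hedged sketch: average a vision-language feature map over a predicted
# segmentation mask to obtain a per-object descriptor for grasp prediction.
import torch

def mask_pool(features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """features: (B, C, H, W); mask: (B, 1, H, W) soft mask in [0, 1] -> (B, C)."""
    weighted = features * mask                       # suppress background pixels
    return weighted.sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)
```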
- North America > United States > New York > Kings County > New York City (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- (2 more...)
A Segmented Robot Grasping Perception Neural Network for Edge AI
Bröcheler, Casper, Vroom, Thomas, Timmermans, Derrick, Akker, Alan van den, Tang, Guangzhi, Kouzinopoulos, Charalampos S., Möckel, Rico
Robotic grasping, the ability of robots to reliably secure and manipulate objects of varying shapes, sizes, and orientations, is a complex task that requires precise perception and control. Deep neural networks have shown remarkable success in grasp synthesis by learning rich and abstract representations of objects. When deployed at the edge, these models can enable low-latency, low-power inference, making real-time grasping feasible in resource-constrained environments. This work implements Heatmap-Guided Grasp Detection, an end-to-end framework for the detection of 6-DoF grasp poses, on the GAP9 RISC-V System-on-Chip. The model is optimised using hardware-aware techniques, including input dimensionality reduction, model partitioning, and quantisation. Object grasping synthesis is a fundamental challenge in robotics, underpinning applications such as automated warehouse operations, patient assistance in healthcare, and object sorting on assembly lines [1]. While humans excel at grasping objects of various shapes and sizes with precision, replicating this ability in robotics remains challenging.
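As a toy example of one of the hardware-aware steps mentioned (quantisation), the sketch below shows symmetric per-tensor int8 weight quantisation; the GAP9 toolchain and the paper's partitioning scheme are not reproduced here.

```python
# Hedged sketch: symmetric per-tensor int8 quantisation of a weight array.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 plus a scale factor for dequantisation."""
    scale = max(np.abs(weights).max() / 127.0, 1e-12)
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale
```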
- Europe > Netherlands > Limburg > Maastricht (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
Simultaneous Pick and Place Detection by Combining SE(3) Diffusion Models with Differential Kinematics
Ko, Tianyi, Ikeda, Takuya, Opra, Balazs, Nishiwaki, Koichi
Grasp detection methods typically target the detection of a set of free-floating hand poses that can grasp the object. However, not all of the detected grasp poses are executable due to physical constraints. Even though it is straightforward to filter invalid grasp poses in post-processing, such a two-stage approach is computationally inefficient, especially when the constraints are hard. In this work, we propose an approach that takes two constraints into account during the grasp detection stage: (i) the picked object must be placeable in a predefined configuration without in-hand manipulation, and (ii) the grasp must be reachable by the robot under joint-limit and collision-avoidance constraints for both the pick and place configurations. Our key idea is to train an SE(3) grasp diffusion network to estimate the noise in the form of a spatial velocity, and to constrain the denoising process by multi-target differential inverse kinematics with inequality constraints, so that the states are guaranteed to be reachable and the placement can be performed without collision. In addition to an improved success rate, we experimentally confirmed that our approach is more efficient and consistent in computation time than a naive two-stage approach. Pick-and-place is one of the most fundamental applications of robots. Despite the significant number of works on generating "pick" poses, few works consider picking and placing simultaneously. A single robot arm with a simple hand often leaves no margin for in-hand manipulation or handover.
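The coupling of denoising and kinematics can be pictured as follows: each predicted spatial velocity is realized through a damped-least-squares differential IK step and clipped to joint limits, so intermediate states stay reachable. The robot interface and the single-target formulation below are simplifying assumptions; the paper uses a multi-target scheme with inequality constraints.

```python
# Hedged sketch: turn a predicted end-effector twist into a joint-space update
# that respects joint limits (robot.jacobian / robot.fk are placeholder calls).
import numpy as np

def constrained_denoise_step(robot, q, twist, dt=0.05, damping=1e-2):
    J = robot.jacobian(q)                              # (6, n) geometric Jacobian
    # Damped least squares: qdot = J^T (J J^T + lambda^2 I)^{-1} twist
    qdot = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(6), twist)
    q_next = np.clip(q + dt * qdot, robot.q_min, robot.q_max)
    return q_next, robot.fk(q_next)                    # pose is reachable by construction
```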
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
GraspMAS: Zero-Shot Language-driven Grasp Detection with Multi-Agent System
Nguyen, Quang, Le, Tri, Nguyen, Huy, Vo, Thieu, Ta, Tung D., Huang, Baoru, Vu, Minh N., Nguyen, Anh
Language-driven grasp detection has the potential to revolutionize human-robot interaction by allowing robots to understand and execute grasping tasks based on natural language commands. However, existing approaches face two key challenges. First, they often struggle to interpret complex text instructions or operate ineffectively in densely cluttered environments. Second, most methods require a training or finetuning step to adapt to new domains, limiting their generalization in real-world applications. In this paper, we introduce GraspMAS, a new multi-agent system framework for language-driven grasp detection. GraspMAS is designed to reason through ambiguities and improve decision-making in real-world scenarios. Our framework consists of three specialized agents: a Planner, responsible for strategizing complex queries; a Coder, which generates and executes source code; and an Observer, which evaluates the outcomes and provides feedback. Intensive experiments on two large-scale datasets demonstrate that GraspMAS significantly outperforms existing baselines. Additionally, robot experiments conducted in both simulation and real-world settings further validate the effectiveness of our approach. Our project page is available at https://zquang2202.github.io/GraspMAS
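The Planner-Coder-Observer interplay can be summarized as a feedback loop like the hypothetical sketch below (the agent interfaces are placeholders, not the GraspMAS implementation).

```python
# Hedged sketch: iterative plan -> code -> execute -> evaluate loop for
# language-driven grasp detection.
def grasp_mas(planner, coder, observer, image, query, max_rounds=3):
    plan = planner.propose(query, image)             # decompose the instruction
    for _ in range(max_rounds):
        code = coder.write(plan)                     # generate executable tool code
        result = coder.execute(code, image)          # run it on the observation
        feedback = observer.evaluate(result, query)  # check against the instruction
        if feedback.success:
            return result.grasp                      # accepted grasp pose
        plan = planner.revise(plan, feedback)        # refine the strategy and retry
    return None
```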
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
- Europe > Austria (0.04)
- (2 more...)
FineGrasp: Towards Robust Grasping for Delicate Objects
Du, Yun, Zhao, Mengao, Lin, Tianwei, Jin, Yiwei, Huang, Chaodong, Su, Zhizhong
Recent advancements in robotic grasping have led to its integration as a core module in many manipulation systems. For instance, language-driven semantic segmentation enables the grasping of any designated object or object part. However, existing methods often struggle to generate feasible grasp poses for small objects or delicate components, potentially causing the entire pipeline to fail. To address this issue, we propose a novel grasping method, FineGrasp, which introduces improvements in three key aspects. First, we introduce multiple network modifications to enhance the model's ability to handle delicate regions. Second, we address the issue of label imbalance and propose a refined graspness label normalization strategy. Third, we introduce a new simulated grasp dataset and show that mixed sim-to-real training further improves grasp performance. Experimental results show significant improvements, especially in grasping small objects, and confirm the effectiveness of our system in semantic grasping.
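One plausible reading of the label-normalization idea is a per-scene rescaling of graspness scores, sketched below; the paper's exact strategy may differ.

```python
# Hedged sketch: rescale raw per-point graspness labels of one scene into [0, 1]
# so that scenes with few graspable points still provide a balanced signal.
import numpy as np

def normalize_graspness(labels: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    lo, hi = labels.min(), labels.max()
    return (labels - lo) / max(hi - lo, eps)
```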